CORE search results: 35 research outputs found

Toward Stance-based Personas for Opinionated Dialogues

Authors: Guerini Marco; Scialom Thomas; Staiano Jacopo; Tekiroglu Serra Sinem
Publication date: 01/01/2020

In the context of chit-chat dialogues, it has been shown that endowing systems with a persona profile is important to produce more coherent and meaningful conversations. Still, the representation of such personas has thus far been limited to a fact-based representation (e.g. "I have two cats."). We argue that these representations remain superficial w.r.t. the complexity of human personality. In this work, we propose to take a step forward and investigate stance-based personas, trying to grasp more profound characteristics, such as opinions, values, and beliefs, to drive language generation. To this end, we introduce a novel dataset that allows exploring different stance-based persona representations and their impact on claim generation, showing that they are able to grasp abstract and profound aspects of the author persona.

Comment: Accepted at Findings of EMNLP 202

Sources: arXiv.org e-Print Archive; Crossref; Archivio della ricerca - Fondazione Bruno Kessler

ColdGANs: Taming Language GANs with Cautious Sampling Strategies

Authors: Dray Paul-Alexis; Lamprier Sylvain; Piwowarski Benjamin; Scialom Thomas; Staiano Jacopo
Publication date: 08/06/2020

Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poor-quality generated text. At the root of these limitations is the mismatch between training and inference, i.e. the so-called exposure bias, exacerbated by considering only the reference texts as correct, while in practice several alternative formulations could be as good. Generative Adversarial Networks (GANs) can mitigate those limitations, but the discrete nature of text has hindered their application to language generation: the approaches proposed so far, based on Reinforcement Learning, have been shown to underperform MLE. Departing from previous work, we analyze the exploration step in GANs applied to text generation, and show how classical sampling results in unstable training. We propose to consider alternative exploration strategies in a GAN framework that we name ColdGANs, where we force the sampling to be close to the distribution modes to get smoother learning dynamics. For the first time, to the best of our knowledge, the proposed language GANs compare favorably to MLE, and obtain improvements over the state of the art on three generative tasks, namely unconditional text generation, question generation, and abstractive summarization.

Sources: arXiv.org e-Print Archive; Hal-Diderot

Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering

Authors: Keraron Rachel; Riabi Arij; Sagot Benoît; Scialom Thomas; Seddah Djamé; Staiano Jacopo
Publication date: 14/10/2021

Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performance of state-of-the-art multilingual models is significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one desires to support. We propose a method to improve Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method significantly outperforms the baselines trained on English data only. We report a new state of the art on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr).

Comment: 7 pages

Sources: arXiv.org e-Print Archive; INRIA a CCSD electronic archive server

Augmented Language Models: a Survey

Authors: Celikyilmaz Asli; Dessì Roberto; Dwivedi-Yu Jane; Grave Edouard; LeCun Yann; Lomeli Maria; Mialon Grégoire; Nalmpantis Christoforos; Pasunuru Ram; Raileanu Roberta; Rozière Baptiste; Schick Timo; Scialom Thomas
Publication date: 15/02/2023

This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks, while the latter consists of calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing-token prediction objective, such augmented LMs can use various, possibly non-parametric, external modules to expand their context processing ability, thus departing from the pure language modeling paradigm. We therefore refer to them as Augmented Language Models (ALMs). The missing-token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks. In this work, after reviewing current advances in ALMs, we conclude that this new research direction has the potential to address common limitations of traditional LMs such as interpretability, consistency, and scalability issues.

Source: arXiv.org e-Print Archive